Reflection

Throughout this module, our team was tasked with developing a relational database solution for ADR Logistics, a transportation company seeking to centralize and better utilize its operational data. At the outset, our aim was to convert fragmented datasets relating to vehicle status, maintenance, trips, and sensor data into a normalized, query-efficient database using SQL. The project also provided a practical context for exploring a broad range of big data topics that are increasingly relevant to data science work in the transportation industry.

One of the most important skills developed during this project was a rigorous approach to data wrangling: extraction, cleaning, normalization, and loading. Early on, I learned to recognize the practical challenges presented by raw datasets. Using the Logistics Vehicle Maintenance History Dataset (Kaggle, n.d.), I applied pandas in Python to explore, validate, and clean the data. Standardizing column formats, handling missing or inconsistent records, and casting fields to appropriate SQL data types became regular parts of my workflow. These tasks reinforced the idea that a significant portion of real-world data science work involves making data usable before any meaningful analysis or modelling can take place. In the context of transportation, where sensor readings, logs, and manual inputs often come with quality issues, these foundational skills have proven directly relevant to both my academic and industry work.
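The cleaning steps described above can be illustrated with a minimal pandas sketch. It is not the script our team used: the column names, sample values, and the `clean_maintenance` helper are hypothetical, chosen only to show column standardization, handling of inconsistent records, and type casting.

```python
import pandas as pd

# Hypothetical raw extract; column names and values are illustrative only.
raw = pd.DataFrame({
    "Vehicle ID": ["V001", "V002", "V002", "V003"],
    "Last Maintenance Date": ["2023-01-15", "2023-02-30", "2023-02-30", None],
    "Engine Temp": ["88.5", "n/a", "n/a", "91.0"],
})


def clean_maintenance(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize column names, drop exact duplicates, and cast types."""
    out = df.copy()
    # Standardize column names to snake_case so they map cleanly to SQL columns.
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    # Remove rows that are exact duplicates of an earlier row.
    out = out.drop_duplicates().reset_index(drop=True)
    # Cast to SQL-friendly types; unparseable values become missing (NaT/NaN)
    # instead of raising, so they can be reviewed or imputed later.
    out["last_maintenance_date"] = pd.to_datetime(
        out["last_maintenance_date"], format="%Y-%m-%d", errors="coerce"
    )
    out["engine_temp"] = pd.to_numeric(out["engine_temp"], errors="coerce")
    return out


cleaned = clean_maintenance(raw)
print(cleaned)
```

Coercing invalid values to missing, rather than dropping the rows outright, keeps the decision about how to treat incomplete records separate from the parsing step.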

As the project progressed, our team focused on database normalization. By working through First, Second, and Third Normal Forms, we built a schema that minimized redundancy, ensured referential integrity, and made data relationships transparent. This activity highlighted the importance of clear entity-relationship modelling and schema design, especially in environments where scalability and efficient queries are essential. The hands-on SQL implementation and enforcement of primary and foreign key constraints provided direct experience in maintaining data integrity, an issue that is crucial in transportation analytics where bad data can lead to costly operational mistakes.
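As a rough illustration of how primary and foreign key constraints protect integrity, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `vehicle` and `maintenance` tables and their columns are assumptions made for the example, not the schema we delivered, and SQLite stands in for whichever SQL engine is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical two-table schema: each maintenance record must reference
# an existing vehicle, so vehicle details are stored only once.
conn.executescript("""
CREATE TABLE vehicle (
    vehicle_id TEXT PRIMARY KEY,
    model      TEXT NOT NULL
);
CREATE TABLE maintenance (
    maintenance_id INTEGER PRIMARY KEY,
    vehicle_id     TEXT NOT NULL REFERENCES vehicle(vehicle_id),
    service_date   TEXT NOT NULL,
    cost           REAL CHECK (cost >= 0)
);
""")

insert_sql = (
    "INSERT INTO maintenance (vehicle_id, service_date, cost) VALUES (?, ?, ?)"
)
conn.execute("INSERT INTO vehicle VALUES (?, ?)", ("V001", "Truck A"))
conn.execute(insert_sql, ("V001", "2023-01-15", 250.0))


def try_insert_orphan(connection: sqlite3.Connection) -> bool:
    """Attempt to insert a record for a vehicle that does not exist."""
    try:
        connection.execute(insert_sql, ("V999", "2023-03-01", 100.0))
    except sqlite3.IntegrityError:
        return False  # rejected by the foreign key constraint
    return True


orphan_accepted = try_insert_orphan(conn)
row_count = conn.execute("SELECT COUNT(*) FROM maintenance").fetchone()[0]
print(orphan_accepted, row_count)
```

The orphan record is rejected at insert time, which is the behaviour that prevents maintenance history from drifting away from the vehicle records it describes.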

Another aspect that emerged during the project was the need for robust compliance and security practices. While the initial proposal only briefly acknowledged these issues, our final deliverables took a more comprehensive approach. By addressing GDPR-style requirements, such as access controls, audit trails, and data encryption, we planned for regulatory challenges that are becoming more pressing in data-driven industries. These considerations have influenced my approach in current professional projects, encouraging me to treat security and compliance as essential elements of database and pipeline design rather than as afterthoughts.
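One way to picture an audit trail at the database level is a trigger that records every status change. The sketch below is an assumption-laden illustration using Python's `sqlite3` module: the `vehicle_status` and `audit_log` tables and the trigger name are hypothetical, and a production system would also need to record which user made the change and restrict who can modify the log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: a status table and an append-only audit log
# populated automatically by a trigger on every status update.
conn.executescript("""
CREATE TABLE vehicle_status (
    vehicle_id TEXT PRIMARY KEY,
    status     TEXT NOT NULL
);
CREATE TABLE audit_log (
    audit_id   INTEGER PRIMARY KEY,
    vehicle_id TEXT NOT NULL,
    old_status TEXT,
    new_status TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER trg_vehicle_status_update
AFTER UPDATE OF status ON vehicle_status
BEGIN
    INSERT INTO audit_log (vehicle_id, old_status, new_status)
    VALUES (OLD.vehicle_id, OLD.status, NEW.status);
END;
""")

conn.execute("INSERT INTO vehicle_status VALUES (?, ?)", ("V001", "active"))
conn.execute(
    "UPDATE vehicle_status SET status = ? WHERE vehicle_id = ?",
    ("in_maintenance", "V001"),
)

audit_rows = conn.execute(
    "SELECT vehicle_id, old_status, new_status FROM audit_log"
).fetchall()
print(audit_rows)
```

Putting the logging in a trigger means the trail is written regardless of which application or script performs the update.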

I also benefited from a comparative analysis of different database paradigms. While SQL provided the right mix of structure and query performance for our use case, evaluating NoSQL and NewSQL options broadened my understanding of how to match database technologies to specific operational requirements. This awareness will be helpful for future work where unstructured or rapidly evolving data needs to be processed alongside relational datasets.

The collaborative nature of the project meant that I contributed to shared team deliverables, such as Python data cleaning scripts, SQL schema documentation, and technical write-ups. Regular communication and peer review were necessary to keep our group on track and to ensure that each deliverable met the required standard. I learned the value of early stakeholder engagement and regular check-ins, which helped us clarify requirements and address problems before they became major obstacles. These habits are essential in modern data teams, in both academic and professional settings.

From the range of topics covered in the Deciphering Big Data module, I found particular value in the sections on risk management, pipeline development, and evaluating the trade-offs between security, efficiency, and usability. Working with real-world transportation data highlighted common limitations, such as inconsistent sensor streams and the risks of incomplete data. Learning to assess and mitigate these risks, and to document processes clearly, has strengthened my ability to deliver reliable solutions in a big data environment.

In terms of learning outcomes, this project has allowed me to apply a broad range of data science skills. I have demonstrated the ability to identify and address security issues and limitations, select and use the right tools for cleaning and organizing large datasets, and contribute effectively as a member of a remote development team. My individual contributions included exploring and validating the dataset, creating data cleaning workflows in Python, designing and documenting the database schema, and ensuring that compliance and security standards were addressed in our solution. By reflecting on both my own performance and that of the team, I have developed a better understanding of how to balance technical requirements with practical constraints and collaborative workflows.

Looking ahead, I intend to carry these lessons into future academic and professional work. The technical approaches and teamwork habits established during this module are directly applicable to ongoing challenges in the transportation industry, where the volume, variety, and velocity of data continue to grow. I will continue to prioritize rigorous data wrangling, clear documentation, regulatory compliance, and open communication with team members and stakeholders.

References:
Connolly, T. & Begg, C. (2015). Database systems: a practical approach to design, implementation, and management. 6th edn. Harlow: Pearson Education.
Kaggle (n.d.). Logistics Vehicle Maintenance History Dataset. https://www.kaggle.com/datasets/datasetengineer/logistics-vehicle-maintenance-history-dataset
Laudon, K.C. & Laudon, J.P. (2020). Management Information Systems: Managing the Digital Firm. 16th edn. Harlow: Pearson.
Rolfe, G., Freshwater, D. & Jasper, M. (2001). Critical reflection in nursing and the helping professions: a user’s guide. Basingstoke: Palgrave Macmillan.